OCR For Printed Urdu Script Using Feed Forward Neural Network
Abstract
This paper presents an Optical Character Recognition (OCR) system for printed Urdu, a script widely used in Pakistan and India. Urdu is the third most widely understood language in the world, especially in the subcontinent, yet comparatively little effort has been made to make it readable by computers. A large body of work in Urdu literature and Islamic studies still has to be computerized. In the proposed system, individual characters are recognized using our own proposed methods and algorithms. The feature detection methods are simple and robust. Supervised learning is used to train the feed-forward neural network. A prototype of the system has been tested on printed Urdu characters and currently achieves 98.3% character-level accuracy on average. Although the system is script- and language-independent, we have designed it for Urdu characters only.
Keywords: Algorithm, Feed Forward Neural Networks, Supervised learning, Pattern Matching.
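As a rough illustration of what training a feed-forward network with supervised learning involves, the sketch below fits a one-hidden-layer network to three toy 3x3 binary glyphs using plain gradient descent. It is not the authors' method: the glyphs, the layer sizes, the tanh/softmax choices, the learning rate, and the names GLYPHS, init_net, forward, train and predict are all assumptions made for this example, and the abstract does not describe the paper's actual features or network architecture.

```python
import math
import random

# Toy 3x3 binary "glyphs" (hypothetical stand-ins, not real Urdu characters).
GLYPHS = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], 0),  # vertical stroke
    ([0, 0, 0, 1, 1, 1, 0, 0, 0], 1),  # horizontal stroke
    ([1, 0, 0, 0, 1, 0, 0, 0, 1], 2),  # diagonal stroke
]


def init_net(n_in, n_hidden, n_out, seed=0):
    """Create small random weights; the last column of each row is the bias."""
    rng = random.Random(seed)

    def layer(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols + 1)]
                for _ in range(rows)]

    return {"w1": layer(n_hidden, n_in), "w2": layer(n_out, n_hidden)}


def forward(net, x):
    """Return hidden activations (tanh) and class probabilities (softmax)."""
    xb = x + [1.0]
    h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in net["w1"]]
    hb = h + [1.0]
    z = [sum(w * v for w, v in zip(row, hb)) for row in net["w2"]]
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return h, [v / s for v in e]


def train(samples, n_hidden=5, epochs=500, lr=0.1, seed=0):
    """Supervised training: per-sample gradient descent on cross-entropy."""
    n_in = len(samples[0][0])
    n_out = max(label for _, label in samples) + 1
    net = init_net(n_in, n_hidden, n_out, seed)
    for _ in range(epochs):
        for x, label in samples:
            h, p = forward(net, x)
            xb, hb = x + [1.0], h + [1.0]
            # Gradient of the loss w.r.t. the output pre-activations.
            dz2 = [p[k] - (1.0 if k == label else 0.0) for k in range(n_out)]
            # Back-propagate to the hidden pre-activations (tanh' = 1 - h^2).
            dz1 = [
                sum(net["w2"][k][j] * dz2[k] for k in range(n_out))
                * (1.0 - h[j] ** 2)
                for j in range(n_hidden)
            ]
            for k in range(n_out):
                for j in range(n_hidden + 1):
                    net["w2"][k][j] -= lr * dz2[k] * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    net["w1"][j][i] -= lr * dz1[j] * xb[i]
    return net


def predict(net, x):
    """Return the index of the most probable class."""
    _, p = forward(net, x)
    return p.index(max(p))
```

Calling train(GLYPHS) and then predict(net, glyph) on each training glyph shows the supervised loop end to end. A real OCR system would replace the raw pixels with the extracted character features and would be evaluated on held-out samples rather than on the training set.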
Similar resources
Handwritten Nastaleeq Script Recognition with BLSTM-CTC and ANFIS method
A recurrent neural network (RNN) has been successfully applied for recognition of cursive handwritten documents, both in English and Arabic scripts. Ability of RNNs to model context in sequence data like speech and text makes them a suitable candidate to develop OCR systems for printed Nastaleeq scripts (including Nastaleeq for which no OCR system is available to date). In this work, we have pr...
Unconstrained OCR for Urdu using Deep CNN-RNN Hybrid Networks
Building robust text recognition systems for languages with cursive scripts like Urdu has always been challenging. Intricacies of the script and the absence of ample annotated data further act as adversaries to this task. We demonstrate the effectiveness of an end-to-end trainable hybrid CNN-RNN architecture in recognizing Urdu text from printed documents, typically known as Urdu OCR. The solut...
Comparative Analysis of Raw Images and Meta Feature based Urdu OCR using CNN and LSTM
Urdu language uses cursive script which results in connected characters constituting ligatures. For identifying characters within ligatures of different scales (font sizes), Convolution Neural Network (CNN) and Long Short Term Memory (LSTM) Network are used. Both network models are trained on formerly extracted ligature thickness graphs, from which models extract Meta features. These thickness ...
Recognition of Printed Urdu Script
This paper deals with an Optical Character Recognition system for printed Urdu, a popular Indian script. The development of OCR for this script is difficult because (i) a large number of characters have to be recognized (ii) there are many similar shaped characters. In the proposed system individual characters are recognized using a combination of topological, contour and water reservoir concep...
Optical Character Recognition (OCR) for Printed Devnagari Script Using Artificial Neural Network
There are about 300 million people in India who speak Hindi and write Devnagari script. Research in Optical Character Recognition (OCR) is popular for its application potential in banks, post offices, defense organizations and library automation etc. However most of the OCR systems are available for European texts. In this paper, we have proposed a technique for OCR System for different five fo...